Guangwei CONG Noritsugu YAMAMOTO Takashi INOUE Yuriko MAEGAMI Morifumi OHNO Shota KITA Rai KOU Shu NAMIKI Koji YAMADA
Wide deployment of artificial intelligence (AI) is driving exponential growth in energy consumption. Traditional digital platforms are finding it increasingly difficult to meet these ever-growing demands on energy efficiency and computing latency, which necessitates the development of high-efficiency analog hardware platforms for AI. Recently, optical and electrooptic hybrid computing has been revived as a promising analog hardware alternative because it can accelerate information processing in an energy-efficient way. Integrated photonic circuits offer such an analog hardware solution for implementing photonic AI and machine learning. For this purpose, we proposed a photonic analog of the support vector machine and experimentally demonstrated low-latency, low-energy classification computing, which evidences the latency and energy advantages of optical analog computing over traditional digital computing. We also proposed an electrooptic Hopfield network for classifying and recognizing time-series data. This paper reviews our work on implementing classification computing and Hopfield networks with silicon photonic circuits.
Qingping YU You ZHANG Zhiping SHI Xingwang LI Longye WANG Ming ZENG
In this letter, a deep neural network (DNN) aided joint source-channel coding (JSCC) decoding scheme is proposed for polar codes. In the proposed scheme, an integrated factor graph with an unfolded structure is first designed. A DNN-aided flooding belief propagation (FBP) decoding algorithm is then proposed on this integrated factor graph, in which both the source and channel scaling parameters of BP decoding are optimized for better performance. Experimental results show that, with the proposed DNN-aided FBP decoder, the polar-coded JSCC scheme gains about 2-2.5 dB over the polar-coded JSCC system with the existing BP decoder across different source statistics p for source message length NSC = 128, and 0.2-1 dB for NSC = 512.
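For intuition, the following is a toy sketch (in PyTorch, not the authors' decoder) of the central idea: each unfolded BP iteration carries trainable scaling parameters, with separate scales for the channel-side and source-side messages of the integrated factor graph. The message schedule here is purely illustrative; the actual polar factor graph and flooding schedule are omitted.

```python
import torch
import torch.nn as nn

def scaled_check_node(a, b, scale):
    """Scaled min-sum approximation of the BP check-node update."""
    return scale * torch.sign(a) * torch.sign(b) * torch.minimum(a.abs(), b.abs())

class ScaledUnfoldedBP(nn.Module):
    """Schematic unfolded BP with one trainable channel-side and one
    source-side scale per iteration (factor graph structure omitted)."""
    def __init__(self, num_iters=5):
        super().__init__()
        self.channel_scales = nn.Parameter(torch.ones(num_iters))
        self.source_scales = nn.Parameter(torch.ones(num_iters))

    def forward(self, llr_channel, llr_source):
        for s_ch, s_src in zip(self.channel_scales, self.source_scales):
            msg = scaled_check_node(llr_channel, llr_source, s_ch)   # channel-side message
            llr_source = llr_source + scaled_check_node(msg, llr_channel, s_src)  # source-side update
        return llr_source
```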
Kai IKUTA Jinya NAKAMURA Moriya NAKAMURA
In this paper, we investigated the overfitting characteristics of nonlinear equalizers based on an artificial neural network (ANN) and the Volterra series transfer function (VSTF), which were designed to compensate for optical nonlinear waveform distortion in optical fiber communication systems. Linear waveform distortion caused by, e.g., chromatic dispersion (CD) is commonly compensated by linear equalizers using digital signal processing (DSP) in digital coherent receivers; mitigating nonlinear waveform distortion is considered one of the next important issues. An ANN-based nonlinear equalizer is one possible candidate for solving this problem, but the risk of overfitting is an obstacle to using ANNs in practical applications. We evaluated and compared the overfitting of ANN- and conventional VSTF-based nonlinear equalizers used to compensate for optical nonlinear distortion. The equalizers were trained on repeated random bit sequences (RRBSs) while varying the length of the bit sequences. When the number of hidden-layer units of the ANN was as large as 100 or 1000, the overfitting characteristics were comparable to those of the VSTF. However, when the number of hidden-layer units was 10, which is usually sufficient to compensate for optical nonlinear distortion, the overfitting was weaker than that of the VSTF. Furthermore, we confirmed that even commonly used finite impulse response (FIR) filters overfit the RRBS when its length was equal to or shorter than the length of the filters' tapped delay line. Conversely, when the RRBS used for training was sufficiently longer than the tapped delay line, the overfitting could be suppressed, even when using an ANN-based nonlinear equalizer with 10 hidden-layer units.
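As a rough illustration, a minimal ANN-based nonlinear equalizer of the kind discussed (a tapped delay line feeding a small hidden layer) might look like the following PyTorch sketch; the tap count and activation are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ANNEqualizer(nn.Module):
    """Window of received samples (tapped delay line) -> equalized sample."""
    def __init__(self, num_taps=15, hidden_units=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_taps, hidden_units),
            nn.Tanh(),                   # nonlinear hidden layer
            nn.Linear(hidden_units, 1),  # equalized output sample
        )

    def forward(self, x_window):         # x_window: (batch, num_taps)
        return self.net(x_window)

# Training on a repeated random bit sequence that is not much longer than the
# tap count lets the network memorize the pattern (overfitting); a sufficiently
# long training sequence mitigates this.
```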
Despite recent advancements in utilizing meta-learning to address the generalization challenges of graph neural networks (GNNs), their performance in argumentation mining tasks, such as argument classification, remains relatively limited. This is primarily due to the under-utilization of the pattern knowledge intrinsic to argumentation structures. To address this issue, our study proposes a two-stage, pattern-based meta-GNN method, in contrast to conventional pattern-free meta-GNN approaches. In the first stage, our method learns a high-level pattern representation that captures the pattern knowledge within an argumentation structure and then predicts edge types. In the second stage, it employs a meta-learning framework to train a meta-learner based on the predicted edge types, which allows rapid generalization to novel argumentation graphs. Through experiments on real English discussion datasets spanning diverse topics, we demonstrate that the proposed method substantially outperforms conventional pattern-free GNN approaches, marking a significant step forward in this domain.
Shingo YASHIKI Chako TAKAHASHI Koutarou SUZUKI
This paper investigates the effects of backdoor attacks on graph neural networks (GNNs) for graph classification when the networks are trained with simple data augmentation that modifies graph edges. The numerical results show that GNNs trained with such data augmentation remain vulnerable to backdoor attacks and may even be more vulnerable than GNNs trained without it.
Tetsuo KOSAKA Kazuya SAEKI Yoshitaka AIZAWA Masaharu KATO Takashi NOSE
Emotional speech recognition is generally considered more difficult than non-emotional speech recognition. The acoustic characteristics of emotional speech differ from those of non-emotional speech, and they also vary significantly depending on the type and intensity of emotion. Regarding linguistic features, emotional and colloquial expressions are also observed in such utterances. To address these problems, we aim to improve recognition performance by adapting acoustic and language models to emotional speech. We used Japanese Twitter-based Emotional Speech (JTES) as an emotional speech corpus. This corpus consists of tweets, with an emotional label assigned to each utterance. Corpus adaptation is possible using the utterances it contains; however, for the language model, the amount of adaptation data is insufficient. To solve this problem, we propose adapting the language model with online tweet data downloaded from the internet. The sentences used for adaptation were extracted from the tweet data according to certain rules, yielding 25.86 M words that were used for adaptation. In the recognition experiments, the baseline word error rate was 36.11%, whereas that with acoustic and language model adaptation was 17.77%. These results demonstrate the effectiveness of the proposed method.
Ren TAKEUCHI Rikima MITSUHASHI Masakatsu NISHIGAKI Tetsushi OHKI
The war between cyber attackers and security analysts is gradually intensifying. Owing to the ease of obtaining and creating support tools, recent malware continues to diversify into variants and new species. This increases the burden on security analysts and hinders quick analysis. Identifying malware families is crucial for efficiently analyzing diversified malware; thus, numerous low-cost, general-purpose, deep-learning-based classification techniques have been proposed in recent years. Among these methods, malware images, which represent binary features as images, are often used. However, no models or architectures specific to malware classification have been proposed in previous studies. Herein, we conduct a detailed analysis of the behavior and structure of malware and focus on PE sections that capture its unique characteristics. First, we validate the features of each PE section that can distinguish malware families. Then, we identify the PE sections that contain features adequate for classifying families. Further, we propose an ensemble-learning-based classification method that combines the features of highly discriminative PE sections to improve classification accuracy. Validation on two datasets confirms that the proposed method improves accuracy over the baseline, underscoring its importance.
In the field of machine learning security, the application of side-channel analysis such as correlation power/electromagnetic analysis (CPA/CEMA) is expanding as an attack surface, especially for edge devices. Aiming to evaluate the leakage resistance of neural network (NN) model parameters, i.e., weights and biases, we conducted a feasibility study of CPA/CEMA on floating-point (FP) operations, which are the basic operations of NNs. This paper proposes approaches to recover weights and biases using CPA/CEMA on multiplication and addition operations, respectively. To realize the recovery with high precision and efficiency, it is essential to take the characteristics of the IEEE 754 representation into account. We show that CPA/CEMA on FP operations requires different approaches than traditional CPA/CEMA on cryptographic implementations such as the AES.
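To make the ingredients concrete, the following hypothetical Python sketch shows an IEEE 754 single-precision field decomposition, a Hamming-weight leakage hypothesis, and the Pearson correlation step of CPA; the trace acquisition and the specific leakage model for FP multiplication/addition are not reproduced here.

```python
import struct
import numpy as np

def ieee754_fields(x: float):
    """Return (sign, exponent, mantissa) of a single-precision float."""
    bits = struct.unpack(">I", struct.pack(">f", float(x)))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa

def hamming_weight(v: int) -> int:
    return bin(v).count("1")

def cpa(traces, hypotheses):
    """Pearson correlation between a leakage hypothesis and each trace sample.
    traces: (num_traces, num_samples), hypotheses: (num_traces,)."""
    t = traces - traces.mean(axis=0)
    h = hypotheses - hypotheses.mean()
    num = h @ t
    den = np.sqrt((h ** 2).sum() * (t ** 2).sum(axis=0))
    return num / den

# Example hypothesis (assumed, for illustration): the Hamming weight of the
# mantissa of x * w_guess, computed per input x for each guessed weight w_guess.
```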
Jie LUO Chengwan HE Hongwei LUO
Text classification is a fundamental task in natural language processing with extensive applications in domains such as spam detection and sentiment analysis. Syntactic information can be effectively utilized to improve the performance of neural network models in understanding the semantics of text. Chinese text exhibits a high degree of syntactic complexity, with individual words often possessing multiple parts of speech. In this paper, we propose BRsyn-caps, a capsule-network-based Chinese text classification model that leverages both BERT and dependency syntax. Our approach obtains word representations from the pre-trained BERT model for semantic information, extracts contextual information with a long short-term memory (LSTM) network, encodes syntactic dependency trees with a graph attention network, and uses a capsule network to integrate these features for text classification. Additionally, we propose a character-level syntactic dependency tree adjacency matrix construction algorithm, which introduces syntactic information into character-level representations. Experiments on five datasets demonstrate that BRsyn-caps can effectively integrate semantic, sequential, and syntactic information in text, proving the effectiveness of the proposed method for Chinese text classification.
While deep image compression performs better than traditional codecs like JPEG on natural images, it faces a challenge as a learning-based approach: compression performance drops drastically for out-of-domain images. To investigate this problem, we introduce a novel task that we call universal deep image compression, which involves compressing images in arbitrary domains, such as natural images, line drawings, and comics. Furthermore, we propose a content-adaptive optimization framework to tackle this task. The framework adapts a pre-trained compression model to each target image at test time to address the domain gap between pre-training and testing. For each input image, we insert adapters into the decoder of the model and optimize the latent representation extracted by the encoder together with the adapter parameters in terms of rate-distortion, with the adapter parameters transmitted per image. To evaluate the proposed universal deep image compression, we constructed a benchmark dataset containing uncompressed images from four domains: natural images, line drawings, comics, and vector arts. We compare the proposed method with non-adaptive and existing adaptive compression methods, and the results show that our method outperforms them. Our code and dataset are publicly available at https://github.com/kktsubota/universal-dic.
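A simplified sketch of the per-image adaptation loop is given below; `model.encode`, `model.decode`, `model.rate`, and the adapter interface are stand-in names (assumptions), not the released API, and the rate term for the transmitted adapter parameters is only schematic.

```python
import torch

def adapt_to_image(model, adapters, x, lam=0.01, steps=500, lr=1e-3):
    """Optimize the latent y and the adapter parameters for one test image
    under a rate-distortion objective (schematic, assumed interfaces)."""
    y = model.encode(x).detach().requires_grad_(True)           # latent from encoder
    params = [y] + [p for a in adapters for p in a.parameters()]
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        x_hat = model.decode(y, adapters)                       # decoder with inserted adapters
        rate = model.rate(y) + sum(a.rate() for a in adapters)  # bits for latent + adapter params
        dist = torch.mean((x - x_hat) ** 2)                     # distortion (MSE)
        loss = rate + lam * dist
        opt.zero_grad()
        loss.backward()
        opt.step()
    return y.detach(), adapters
```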
Acoustic scene classification (ASC) is a fundamental classification task in artificial intelligence. ASC tasks commonly employ models based on convolutional neural networks (CNNs) that take log-Mel spectrograms as input for gathering acoustic features. In this paper, we designed a CNN-based multi-scale pooling (MSP) strategy for ASC. The log-Mel spectrogram used as the CNN input is partitioned into four segments along the frequency axis, and four CNN channels are devised to process these distinct frequency ranges. The high-level features extracted from the outputs of the different frequency bands are integrated through frequency pyramid average pooling layers at multiple levels. Subsequently, a softmax classifier is employed to classify the different scenes. Our study demonstrates that the designed model leads to a significant enhancement in performance, as evidenced by tests on two acoustic datasets.
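The frequency-axis splitting into four CNN branches could be sketched as follows; the layer sizes and the fusion step are assumptions, and the frequency pyramid average pooling is reduced to simple average pooling for brevity.

```python
import torch
import torch.nn as nn

class MultiBandCNN(nn.Module):
    """Split the log-Mel spectrogram into four frequency bands, process each
    with its own small CNN branch, and fuse the pooled features."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),          # average pooling over time/frequency
            ) for _ in range(4)
        )
        self.classifier = nn.Linear(16 * 4, n_classes)

    def forward(self, spec):                       # spec: (batch, 1, n_mels, time)
        segments = torch.chunk(spec, 4, dim=2)     # split the frequency axis into 4 bands
        feats = [b(s).flatten(1) for b, s in zip(self.branches, segments)]
        return self.classifier(torch.cat(feats, dim=1))
```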
We investigated the influence of horizontal shifts of the input image on one-stage object detection methods. We found that the detector's class scores drop when the target object's center lies at a grid boundary. Many approaches have focused on reducing the aliasing effect of down-sampling to achieve shift-invariance. However, down-sampling does not completely solve this problem at the grid boundary; it is also necessary to suppress the dispersion of features from pixels close to the grid boundary into adjacent grid cells. Therefore, this paper proposes two approaches focused on the grid boundary to improve this weak point of current object detection methods. One is the Sub-Grid Feature Extraction Module, in which sub-grid features are added to the input of the classification head. The other is Grid-Aware Data Augmentation, in which augmented data are generated by grid-level shifts and used in training. The effectiveness of the proposed approaches is demonstrated on the COCO validation set after applying the proposed method to the FCOS architecture.
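A hypothetical sketch of grid-level shift augmentation is shown below; `stride` stands for the grid size of the detection head, and the exact shift range and padding used in the paper are not reproduced.

```python
import numpy as np

def grid_aware_shift(image, boxes, stride=8):
    """Shift the image (and its boxes) by a random offset within one grid cell
    so that object centers also fall near grid boundaries during training.
    image: (H, W, C) array, boxes: (N, 4) array of [x1, y1, x2, y2]."""
    dx = np.random.randint(0, stride)
    dy = np.random.randint(0, stride)
    h, w = image.shape[:2]
    shifted = np.zeros_like(image)
    shifted[dy:, dx:] = image[: h - dy, : w - dx]        # translate, pad with zeros
    boxes = boxes.astype(np.float64).copy()
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]] + dx, 0, w - 1)
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]] + dy, 0, h - 1)
    return shifted, boxes
```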
Jinsoo SEO Junghyun KIM Hyemi KIM
Song-level feature summarization is fundamental for the browsing, retrieval, and indexing of digital music archives. This study proposes a deep neural network model, CQTXNet, for extracting song-level feature summaries for cover song identification. CQTXNet incorporates depth-wise separable convolutions, residual network connections, and attention modules to extend previous approaches. An experimental evaluation of CQTXNet was performed on two publicly available cover song datasets while varying the number of network layers and the type of attention module.
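As a rough illustration of the building blocks named above, a depth-wise separable convolution block with a residual connection might look like the following PyTorch sketch (channel counts are assumptions, and the attention module is omitted).

```python
import torch.nn as nn

class DWSeparableResBlock(nn.Module):
    """Depth-wise separable convolution (depth-wise + point-wise) with a
    residual connection."""
    def __init__(self, channels):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU()

    def forward(self, x):
        out = self.pointwise(self.depthwise(x))
        return self.act(self.bn(out) + x)   # residual connection
```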
Le Trieu PHONG Tran Thi PHUONG Lihua WANG Seiichi OZAWA
In this paper, we explore privacy-preserving techniques in federated learning, including those that can be used with both neural networks and decision trees. We begin by identifying how information can be leaked in federated learning, and then present methods to address this issue by introducing two privacy-preserving frameworks that encompass many existing privacy-preserving federated learning (PPFL) systems. Through experiments with publicly available financial, medical, and Internet of Things datasets, we demonstrate the effectiveness of privacy-preserving federated learning and its potential for developing highly accurate, secure, and privacy-preserving machine learning systems in real-world scenarios. The findings highlight the importance of considering privacy in the design and implementation of federated learning systems and suggest that privacy-preserving techniques are essential for developing effective and practical machine learning systems.
Aorui GOU Jingjing LIU Xiaoxiang CHEN Xiaoyang ZENG Yibo FAN
Convolutional neural networks (CNNs) and Transformers have achieved remarkable performance in detection and classification tasks. Nevertheless, their feature extraction cannot capture local and global information simultaneously, so detection and classification performance can be further improved. In addition, deep learning networks are becoming increasingly complex, and the required computation and storage are growing significantly. This paper proposes a combination of CNN and Transformer and designs a local feature enhancement module and a global context modeling module to enhance the cascade network. While the local feature enhancement module increases the range of feature extraction, the global context modeling module captures the global information of the feature maps. To decrease model complexity, a shared sublayer is designed to share weight parameters between adjacent or non-adjacent convolutional layers, thereby reducing the number of convolutional weight parameters. Moreover, to improve detection performance without increasing the number of network parameters, an optimal transport assignment approach is proposed to resolve the label assignment problem, where the classification and regression losses are the summed costs between demanders and suppliers. The experimental results demonstrate that the proposed Combination of CNN and Transformer with Shared Sublayer (CCTSS) performs better than state-of-the-art methods on various datasets and applications.
Daniel Akira ANDO Yuya KASE Toshihiko NISHIMURA Takanori SATO Takeo OHGANE Yasutaka OGAWA Junichiro HAGIWARA
Direction of arrival (DOA) estimation is an antenna array signal processing technique used in, for instance, radar and sonar systems, source localization, and channel state information retrieval. As new applications and use cases appear with the development of next-generation mobile communication systems, DOA estimation performance must be continually improved to support the ever-growing demand for wireless technologies. In previous works, we verified that a deep neural network (DNN) trained offline is a strong candidate tool for achieving high on-grid DOA estimation performance, even compared to traditional algorithms. In this paper, we propose new techniques for further improving DOA estimation accuracy, incorporating signal-to-noise ratio (SNR) prediction and an end-to-end DOA estimation system that consists of three components: a source number estimator, a DOA angular spectrum grid estimator, and a DOA detector. Here, we improve the performance of the DOA detector and the angular spectrum estimator and present a new DNN-based source number estimator with a very simple design. The proposed DNN system with these enhancement techniques shows excellent estimation performance in terms of the success rate for two radio wave sources, although the results for three sources are not yet fully satisfactory.
Peiqi ZHANG Shinya TAKAMAEDA-YAMAZAKI
Binary neural networks (BNNs) binarize neuron and connection values so that their accelerators can be realized with extremely efficient hardware. However, there is a significant accuracy gap between BNNs and networks with wider bit-widths. Conventional BNNs binarize feature maps with static, globally unified thresholds, which makes the produced bipolar image lose local details. This paper proposes a multi-input activation function that enables adaptive thresholding for binarizing feature maps: (a) At the algorithm level, instead of operating on each input pixel independently, adaptive thresholding dynamically changes the threshold according to the pixels surrounding the target pixel. When optimizing weights, adaptive thresholding is equivalent to an accompanying depth-wise convolution between the normal convolution and binarization. The accompanying weights in the depth-wise filters are ternarized and optimized end-to-end. (b) At the hardware level, adaptive thresholding is realized through a multi-input activation function, which is compatible with common accelerator architectures; compact activation hardware with only one extra accumulator is devised. When the proposed method is implemented on an FPGA, a 4.1% accuracy improvement over the original BNN is achieved with only 1.1% extra LUT resources. Compared with state-of-the-art methods, the proposed idea further increases network accuracy by 0.8% on the CIFAR-10 dataset and 0.4% on the ImageNet dataset.
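The adaptive-thresholding idea could be sketched as follows: a depth-wise convolution over the neighborhood of each pixel produces a per-pixel threshold against which the feature map is binarized. The ternarization rule and the straight-through gradient handling here are simplified assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveBinarize(nn.Module):
    """Binarize a feature map against a per-pixel threshold computed by a
    depth-wise convolution over the surrounding pixels."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.threshold_conv = nn.Conv2d(channels, channels, kernel_size,
                                        padding=kernel_size // 2,
                                        groups=channels, bias=False)

    def forward(self, x):
        w = self.threshold_conv.weight
        w_t = torch.sign(w) * (w.abs() > 0.05).float()   # crude ternarization to {-1, 0, +1}
        thresh = F.conv2d(x, w_t,
                          padding=self.threshold_conv.padding,
                          groups=self.threshold_conv.groups)
        return torch.sign(x - thresh)                     # per-pixel adaptive binarization
```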
For massive multiple-input multiple-output (MIMO) communication systems, simple linear detectors such as zero forcing (ZF) and minimum mean square error (MMSE) can achieve near-optimal detection performance with reduced computational complexity. However, such linear detectors always involve complicated matrix inversion, which suffers from high computational overhead in practical implementations. Owing to their massively parallel processing and efficient hardware implementation, neural networks have become a promising approach to signal processing for future wireless communications. In this paper, we first propose an efficient neural network, termed the PINN, that calculates the pseudo-inverse of any type of matrix based on an improved Newton's method. Through detailed analysis and derivation, the linear massive MIMO detectors are mapped onto PINNs, which can take full advantage of the research achievements of neural networks in both algorithms and hardware. Furthermore, an improved limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) quasi-Newton method is studied as the learning algorithm of PINNs to achieve a better performance/complexity trade-off. Simulation results validate the efficiency of the proposed scheme.
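For reference, the classical Newton-Schulz iteration that this kind of improved Newton's method builds on computes the pseudo-inverse as X_{k+1} = X_k(2I - AX_k); a short NumPy sketch with an assumed safe initialization is given below.

```python
import numpy as np

def newton_pinv(A, num_iters=30):
    """Newton-Schulz iteration for the Moore-Penrose pseudo-inverse of A."""
    # Safe initialization X_0 = A^H / (||A||_1 * ||A||_inf) ensures convergence.
    X = A.conj().T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I = np.eye(A.shape[0], dtype=A.dtype)
    for _ in range(num_iters):
        X = X @ (2.0 * I - A @ X)
    return X

# In an MMSE detector, the estimate is x_hat = (H^H H + sigma^2 I)^{-1} H^H y;
# the matrix inversion can be replaced by an iteration of this kind, which is
# what the paper describes mapping onto PINN layers.
```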
In this paper, we describe a wavelength-division multiplexing visible-light communication (VLC) system using two colored light-emitting diodes (LEDs) with similar emission wavelengths. A multi-input multi-output signal-separation method using a neural network is proposed to cancel the optical crosstalk caused by the spectral overlap of the LEDs. The experimental results demonstrate that signal separation using neural networks can be achieved in wavelength-multiplexed VLC systems with a bit error rate of less than 3.8×10-3 (the forward error correction limit). Furthermore, the simulation results reveal that the carrier-to-noise ratio (CNR) is improved by 2 dB for the successive interference canceller (SIC) compared to the zero-forcing method.
Existing simple routing protocols (e.g., OSPF, RIP) have the disadvantages of being inflexible and prone to congestion due to the concentration of packets on particular routers. To address these issues, packet routing methods using machine learning have been proposed recently. Compared to these traditional protocols, machine-learning-based methods can choose a routing path intelligently by learning efficient routes. However, machine-learning-based methods suffer from training time overhead. We thus focus on a lightweight machine learning algorithm, OS-ELM (Online Sequential Extreme Learning Machine), to reduce the training time. Although previous work on reinforcement learning using OS-ELM exists, it suffers from low learning accuracy. In this paper, we propose OS-ELM QN (Q-Network) with a prioritized experience replay buffer to improve the learning performance. It is compared to a deep-reinforcement-learning-based packet routing method using a network simulator. Experimental results show that introducing the experience replay buffer improves the learning performance, and OS-ELM QN achieves a 2.33-times speedup over a DQN (Deep Q-Network) in terms of learning speed. Regarding packet transfer latency, OS-ELM QN is comparable to or slightly inferior to the DQN, while both are better than OSPF in most cases because they can distribute congestion.
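For context, a compact sketch of the standard OS-ELM update that such a Q-network could use as its learning rule is given below; the prioritized experience replay buffer and the routing environment are omitted, and the hidden size and initialization are assumptions.

```python
import numpy as np

class OSELM:
    """Online Sequential Extreme Learning Machine: fixed random hidden layer,
    output weights beta updated sequentially in closed form."""
    def __init__(self, n_in, n_hidden, n_out, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.W = rng.standard_normal((n_in, n_hidden))   # fixed random input weights
        self.b = rng.standard_normal(n_hidden)           # fixed random biases
        self.beta = np.zeros((n_hidden, n_out))          # trainable output weights
        self.P = np.eye(n_hidden) * 1e3                  # (regularized) inverse correlation matrix

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def update(self, X, T):
        """Sequential least-squares update with a new chunk of inputs X and targets T."""
        H = self._hidden(X)
        K = np.linalg.inv(np.eye(len(X)) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta               # e.g., Q-values for each action
```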